ED 283 833                                          TM 870 322

AUTHOR        Marso, Ronald; Pigge, Fred L.
TITLE         A State-Wide Assessment of the Testing and Evaluation
              Needs and Proficiencies of Beginning Teachers:
              Implications for Staff Development.
PUB DATE      Mar 87
NOTE          30p.; Paper presented at the Annual Meeting of the
              Association of Supervision and Curriculum Development
              (New Orleans, LA, March 21-24, 1987).
PUB TYPE      Speeches/Conference Papers (150) -- Reports -
              Research/Technical (143)
EDRS PRICE    MF01/PC02 Plus Postage.
DESCRIPTORS   *Administrator Attitudes; *Beginning Teachers;
              Elementary School Teachers; Elementary Secondary
              Education; *Needs Assessment; Rating Scales;
              Secondary School Teachers; State Surveys; *Teacher
              Evaluation; *Teacher Made Tests; *Teaching Skills;
              Test Construction; Training Objectives
IDENTIFIERS   Ohio



ABSTRACT 

In order to further existing knowledge about classroom testing
practices, this study surveyed Ohio supervisors' and principals'
perceptions of beginning teachers' needs for and proficiency in
selected testing and evaluation competencies. Respondents considered
typical beginning elementary and secondary teachers, rather than
teachers of special education, music, art, or physical education.
Usable survey responses to the 26-item rating scale were returned by
229 teacher supervisors, 313 building principals, and 44 curriculum
or instruction coordinators. Four hypotheses were examined. First,
beginning teachers' needs and proficiencies are similar, indicating
they are well trained in testing and evaluation competencies,
specifically test construction and test score use. Second, levels of
training are equal for different grade levels and for rural, urban,
or suburban schools. Third, testing competencies are as high or
higher than teachers' reported subject knowledge, professional
education competencies, and overall competencies. Fourth, supervisors
and principals will agree on beginning teachers' needs, as well as on
their proficiencies. Results indicated the following: (1) the
teachers' proficiencies were inadequate to meet the job needs;
(2) hypothesis 2 was supported; (3) hypothesis 3 was rejected; and
(4) principals and supervisors agreed on teachers' evaluation
competency needs but disagreed on proficiencies. (GDC)



* Reproductions supplied by EDRS are the best that can be made *
* from the original document.                                  *



State-Wide Assessment



A State-Wide Assessment of the Testing and Evaluation
Needs and Proficiencies of Beginning Teachers:

Implications for Staff Development



Ronald Marso
Fred Pigge
College of Education and Allied Professions
Bowling Green State University
Bowling Green, Ohio 43403



A paper presented at the annual meeting of the 
Association of Supervision and Curriculum Development 

New Orleans, LA
March 21-24, 1987 






Running head: STATE-WIDE ASSESSMENT








A State-Wide Assessment of the Testing and Evaluation 
Needs and Proficiencies of Beginning Teachers: 
Implications for Staff Development
Despite the fact that teacher-made tests occupy far more
classroom time and effort than do standardized tests, standardized
tests have won the attention and interest of researchers, media,
and the general public (Coffman, 1971; Fleming & Chambers, 1983).
However, the interests of the general public in standardized
testing can best be described as running sweet and sour, much to
the chagrin of the measurement profession. The current strident
demands for increased educational accountability and higher
educational standards via testing have followed so closely on the
heels of demands for standardized testing moratoriums that one
hears echoes of cries of testing social injustice and discrimination
concomitant with cries of elitism (Green, 1975; Madaus, 1985).
Nevertheless, public attention and research interest have
facilitated a large and growing body of knowledge about standardized
tests and testing, whereas research on teacher-made tests and
testing practices in the public school classrooms has largely been
neglected. Further, the limited research on teacher-made tests
and their use is restricted in scope by a preponderance of studies
having been conducted in college classrooms and having been
primarily limited to investigations of test reliability and test
item characteristics. The gravity of this situation is such that
Dwyer (1982) maintained that the advice given to preservice and
inservice teachers regarding the use of teacher-made tests in the
public schools reflected a consensus of professional judgments
rather than a foundation of empirical research. Similarly,
Gullickson (1984) stated that we do not know whether classroom
tests are used effectively or how they are used.

Some research that has been conducted in the public schools
does provide a few suggestions about the effects of certain testing
practices, although we do not know whether or not teachers are
using these practices. Further, these suggestions appear to be
consistent with those derived from the earlier investigations
conducted on college-age subjects (Balch, 1964). For example,
recent research in the public school classrooms suggests that
students prepare differently for varied test item types
(D'Ydewalle, Swerts, & DeCorte, 1983; Kulhavy, Dyer, & Silver,
1975), that students have preferences for certain test item types
(Shaha, 1984), that certain types of feedback following a test
enhance learning and other types apparently do not (Hanna, 1976;
Stewart & White, 1976; Wexley & Thornton, 1972), that the
frequent administration of tests designed to facilitate learning
tends to do so (Peckham & Roe, 1977), and that time spent in
testing may be more efficient in promoting learning than comparable
time spent in reviewing content (Nungester & Duchastel, 1982).






In reporting on one of the few investigations of testing
attitudes and practices in the public school classrooms,
Gullickson (1984) described the existing research on teacher-made
testing practices as limited and idiosyncratic. She conducted a
state-wide survey of third, seventh, and tenth-grade teachers
regarding their attitudes toward the effects of classroom tests on
student learning, constraints associated with testing (e.g.,
availability of scoring assistance), effectiveness of tests as an
evaluative tool (e.g., do tests facilitate instruction?), and
testing practices and beliefs (e.g., students dislike taking tests).
Most of the teachers in her sample reported having taken a college
measurement class. She concluded that teachers are very supportive
of the use of classroom tests, are comfortable with their knowledge
about and use of teacher-made tests, feel that tests should be
given frequently, and feel that tests are helpful in the
instructional process but that they have limited evaluative
usefulness. Lambert (1980-81) obtained a nationwide sample of
opinions about standardized and teacher-made tests and testing
practices in the public schools from chairpersons of state
legislative committees, from principal officials in state teacher
associations, and from the deans of the three largest teacher
training institutions in each state. He found both agreement and
divergent opinions within and between the three groups sampled.
Widely divergent opinions were identified on matters such as
whether or not teacher training institutions should offer
instruction in tests and measurements (e.g., one-third of the
deans reported that their colleges did not offer such a course and
had no intention of doing so), whether or not norm-referenced
tests should be used for educational program evaluation, whether
or not existing standardized tests are biased, and whether or not
multiple-choice tests really assess the competence of classroom
teachers. Contrarily, he found considerable agreement on opinions
such as that classroom teachers have a generally negative attitude
about standardized tests, that it is important for teachers to
produce superior classroom tests, and that teachers should know
more about standardized tests and their use. Lambert's general
conclusion was that there is an apparent need to make all three
groups used in his nationwide survey more aware of both the values
and limitations of tests.

A single study was identified which specifically addressed
the overall nature of teacher-made testing practices in the public
schools. Rogers (1985) had each of 89 university students
registered for a tests and measurements class conduct an
open-ended interview of an inservice public school teacher
regarding the pupil evaluation process inclusive of test planning,
test construction, testing for instruction, grading of students,
and use of standardized tests. The inductive data analysis
employed to interpret the reports of the university students
following the open-ended interviews resulted in the following
generalizations: most of the teachers interviewed used paper and
pencil tests, most used both self-constructed and publisher-made
tests, most planned tests around curriculum guide objectives
rather than test specification tables, most of the teachers used
the percentage correct method of scoring, and the teachers varied
considerably in the extent of their use of standardized test
scores.
Purpose 

The general purpose of this state-wide survey was to further
existing knowledge about classroom testing practices through
investigating supervisors' and principals' perceptions of beginning
teachers' need for and proficiency in selected classroom testing
and evaluation competency areas. The administrators were asked to
use "typical" beginning elementary or secondary content teachers
as frames of reference when completing the survey form. In other
words, they were asked not to consider the testing and evaluation
needs and proficiencies of beginning teachers in the special
education and specialized (music, art, physical education) areas.
Also, they were asked to consider the "typical" beginning
elementary or content area teacher, not the best nor the poorest,
just those teachers in the "middle" who might be classified as
"typical."




The following four general hypotheses guided the investigation:

1) The supervisors and principals will report that beginning
teachers are well trained relative to the competencies needed
for classroom testing and evaluation. Specifically, the
principals' and supervisors' ratings of the extent that
competencies are needed will not differ significantly from
their ratings of the proficiency of beginning teachers for:
a) test development competencies, or b) test score use
competencies.

2) The supervisors and principals with different grade level
and school type assignments will report that beginning
teachers are equally well trained in testing and evaluation
competencies. Specifically, the ratings of beginning teachers'
testing proficiencies and needs will not differ significantly
when the rating supervisors or principals are assigned to:
a) elementary as compared to middle or high school grades,
and b) rural as compared to urban or suburban schools.

3) The supervisors and principals will report that beginning
teachers' competencies in classroom testing and evaluation
are equivalent to the level of their other professional
competencies. Specifically, the supervisors and principals
will rate the beginning teachers' testing related competencies
as high or higher than: a) knowledge of their subject areas,
b) their other professional education competencies, and
c) their overall competencies as educators.

4) The supervisors and principals will be in close agreement
about beginning teachers' testing and evaluation needs and
proficiencies. Specifically, the supervisors' ratings of
beginning teachers' competencies when compared to the
principals' ratings will not differ significantly for:
a) needs, or b) proficiencies.

Method 

A survey instrument was constructed and sent during the
winter of 1986 to a stratified random sample of supervisors of
teachers and building principals in Ohio. The names and addresses
of the subjects were selected from the State directory of schools.
The type of school system (city, exempted village, or county
local), the job assignment (principal or supervisor), and school
grade level (elementary, middle, or secondary) classifications
were used as strata in the random selection process. A total of
800 survey forms were mailed, from which 586 (73%) usable survey
responses were obtained after two follow-up contacts of
nonrespondents. A total of 229 supervisors, 313 building
principals, and 44 individuals in related supervisory roles
(coordinators of curriculum or instruction, etc.) returned usable
and completed survey forms.
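
The stratified selection described above can be illustrated with
a minimal sketch. The paper does not report how many names were
drawn from each stratum, so the equal sampling fraction, the
directory contents, and the function name below are all
hypothetical; only the three stratifying variables and the
586-of-800 response figures come from the text.

```python
import random
from collections import defaultdict

def stratified_sample(directory, strata_keys, fraction, seed=0):
    """Draw the same fraction at random from every stratum.

    Assumption: proportional allocation. The paper does not state
    how many names were drawn per stratum.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in directory:
        strata[tuple(person[k] for k in strata_keys)].append(person)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical directory: 3 system types x 2 job roles x 3 grade
# levels, with 20 invented entries in each of the 18 cells.
directory = [
    {"id": f"{system}/{job}/{grade}/{i}",
     "system": system, "job": job, "grade": grade}
    for system in ("city", "exempted village", "county local")
    for job in ("principal", "supervisor")
    for grade in ("elementary", "middle", "secondary")
    for i in range(20)
]

sample = stratified_sample(directory, ("system", "job", "grade"), 0.25)

# Response rate reported in the paper: usable forms / forms mailed.
response_rate = 586 / 800
print(len(sample), f"{response_rate:.0%}")
```

Sampling within each stratum keeps every combination of system
type, job assignment, and grade level represented in the mailing
list, which a simple random draw would not guarantee.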

The survey instrument consisted of a 26-item listing of
competencies related to the development and use of teacher-made
tests. These items were selected and reviewed for appropriateness
by a team of five professors responsible for the instruction of
the tests and measurements course for preservice teachers at
Bowling Green State University. The items then were grouped into
two sections of the survey instrument, with 17 items identified as
test development competencies and nine items identified as test
use competencies. Two five-point ('5' as high and '1' as low)
Likert-type response scales were provided for each competency item
and identified as: "need of this competency to be a successful
teacher in your school" and "average proficiency of your new
teachers in this competency." Each respondent was also asked to
indicate the nature of his/her school(s) assignment (rural, urban,
or suburban) and the grade level of his/her assignment (elementary,
middle grades, secondary, K-12 grades, or other). Those respondents
placing themselves in the "other" category were excluded from the
analyses related to specific school assignments. Additionally,
the respondents were asked to rate the preparation of their typical
beginning teachers in tests and evaluation competencies via three
Likert-type five-point scale items ('1' much below average to '5'
well above average) relative to: the beginning teachers' subject
area knowledge, the beginning teachers' knowledge and skill in
other professional education competencies (planning, discipline,
etc.), and the beginning teachers' overall (general) competencies
as educators.




Results 

Hypothesis One: Level of Needs Versus Proficiencies

A t test of the difference between dependent means was used
to analyze the combined supervisor and principal ratings (N = 586)
of the beginning teachers' need for and proficiency in each
identified competency. A t test was completed between the need
mean and the proficiency mean for each of the 17 test development
competency items and for each of the nine test score use
competency items; also, t-ratios were completed on the totals for
each of the two sections.

The t test analysis procedures resulted in the rejection of
hypothesis one 'a' and 'b', as significant differences (p < .001)
between the need and proficiency mean ratings were noted for each
of the 26 competency items. Descriptions of the items, need and
proficiency means, t-ratios, and other data related to these
analyses are presented in Tables 1 and 2. For each of the 26
competency items the combined group of supervisors and principals
rated the mean need for the competency significantly higher than
they rated the typical beginning teachers' proficiency in that
competency area. This would suggest that the supervisors and
principals felt that the typical beginning teachers' test
development and test score use proficiencies were inadequate to
meet the needs of their jobs.
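
As an illustration of the dependent-means t test named above, the
following sketch computes the statistic for one competency item
from paired need and proficiency ratings. The eight pairs of
ratings are invented for the example and are not the study's data;
the function name is likewise hypothetical.

```python
import math
from statistics import mean, stdev

def paired_t(need, proficiency):
    """t statistic for the difference between dependent means.

    Each respondent contributes one need rating and one proficiency
    rating for the same item, so the test works on the differences.
    """
    diffs = [n - p for n, p in zip(need, proficiency)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1

# Hypothetical 1-to-5 ratings from eight respondents for one item.
need = [5, 4, 5, 4, 5, 3, 4, 5]
proficiency = [3, 3, 4, 2, 4, 3, 3, 3]

t, df = paired_t(need, proficiency)
print(f"t({df}) = {t:.2f}")
```

A large positive t here means the need ratings exceed the
proficiency ratings by more than respondent-to-respondent
variation in the differences would explain.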




To better identify which beginning teacher competencies the
respondents reported as being most deficient, a discrepancy index
was calculated for each item (need mean minus proficiency mean)
and each item was then ranked relative to this discrepancy index
(see Table 1 and Table 2). The three items with the highest
discrepancy "scores" for the test development competencies (see
Table 1, items 9, 10, and 8b) were closely related to the impact
of tests on pupil learning: writing questions demanding higher
thinking processes, writing questions representing true student
progress, and the scoring of essay questions. The three items
with highest discrepancy "scores" for the test score use competencies
(see Table 2, items 4, 9, and 5), similarly, were all associated
with the use of tests to improve learning. Conversely, the items
with lowest discrepancy scores on both sets of competency areas
appeared to be skills (math calculations, grading, writing items,
selecting items, or use of sociometric techniques) less directly
related to the instructional-learning process.
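
The discrepancy index is simple arithmetic, and a short sketch may
make the ranking step concrete. The item labels echo competencies
named in the text, but the need and proficiency means are invented
for illustration; the actual values appear in Tables 1 and 2.

```python
def rank_by_discrepancy(items):
    """Return (item, need mean minus proficiency mean), largest first."""
    scored = [(name, round(need - prof, 2)) for name, need, prof in items]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# (item, need mean, proficiency mean): hypothetical values only.
items = [
    ("writing higher-order questions", 4.3, 2.7),
    ("calculating grades", 4.0, 3.6),
    ("scoring essay questions", 4.1, 2.8),
]

ranked = rank_by_discrepancy(items)
for rank, (name, gap) in enumerate(ranked, start=1):
    print(rank, name, gap)
```

Items at the top of the ranking are those where the rated need
most exceeds the rated proficiency.
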
Hypothesis Two: Needs and Proficiencies by Grade and School

A series of F tests were used to analyze the ratings of the
principals and supervisors when classified by grade level
assignment (elementary, middle school, or secondary school) or
school assignment (rural, urban, or suburban). These analyses
were completed on the total rating scores for the combined 17
items in the test development and the combined nine items in the
use of test scores sections of the questionnaire.
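
The paper does not spell out the form of its F tests; a one-way
analysis of variance across the three groups is the usual reading,
and the sketch below assumes it. The total scores are invented
(four raters per group) and the function name is hypothetical.

```python
from statistics import mean

def one_way_f(groups):
    """F ratio for a one-way analysis of variance.

    Returns (F, df_between, df_within).
    """
    scores = [x for g in groups for x in g]
    grand_mean = mean(scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical total need scores on the 17 test development items.
elementary = [68, 70, 66, 72]
middle = [67, 69, 71, 65]
secondary = [66, 68, 70, 64]

f, df_b, df_w = one_way_f([elementary, middle, secondary])
print(f"F({df_b}, {df_w}) = {f:.2f}")
```

An F ratio near or below 1, as in this invented example, indicates
that the group means differ no more than the spread of ratings
within groups would lead one to expect.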






Hypothesis number two was accepted, as no significant
differences were identified for either the test development or the
use of test scores sets of competencies for either the grade level
or the school assignment classifications of the respondents. The
need rating means for the set of 17 test development competencies
for the grade level classification were: elementary 68.86, middle
68.31, and secondary 67.94 (F = 0.46, p = .63). The proficiency
rating means for this same set of competencies and classification
were: elementary 50.45, middle school 49.83, and secondary 49.39
(F = 1.07, p = .34). The need means for the set of nine test score
use competencies for the grade level classification were:
elementary 37.46, middle 37.34, and secondary 36.95 (F = 0.51,
p = .60). The total proficiency means for this same set of
competencies and classification were: elementary 27.55, middle
26.91, and secondary 26.91 (F = 1.14, p = .32).

The need means for the set of 17 test development competencies
categorized by type of school assignment were: rural 68.37, urban
68.04, and suburban 68.34 (F = 0.03, p = .97). The proficiency rating
means for this same questionnaire section were: rural 49.84,
urban 49.4[illegible], and suburban 48.94 (F = 0.82, p = .44). The
need means for the test score use section of the questionnaire
were: rural 36.94, urban 37.10, and suburban 37.42 (F = 0.52,
p = .60). The proficiency means for this same section were: rural
27.29, urban 26.56, and suburban 26.64 (F = 1.40, p = .25).






The lack of significant mean differences among the competency
ratings when classified by type of school assignment or by grade
level assignment would suggest that the raters were consistent in
their ratings of beginning teachers, that beginning teachers were
seen as having similar levels of proficiencies despite different
grade level or school assignments, and that the principals and
supervisors perceived testing and evaluation job needs as being
similar for varied school or grade settings.

Hypothesis Three: Testing Versus Other Professional Competencies

In the third section of the questionnaire, the respondents
were requested to make an "overall assessment of the preparation
of typical beginning teachers in competencies related to tests and
evaluation." Three items were provided in this section requiring
the principals and supervisors to rate the testing and evaluation
competencies of beginning teachers relative to: knowledge of
their subject areas, their other professional education competencies,
and their overall competencies as educators. Each of these three
items had a response scale from one to five, respectively:
(1) much below average, (2) somewhat below average, (3) about
average, (4) somewhat above average, and (5) well above average.

The responses to this section of the questionnaire were
analyzed by various grade levels and types of school assignment
for the total group of respondents and by supervisor as compared
to principal ratings. When the total group of respondents was
classified by grade assignment of the raters (elementary, middle,
or secondary schools) and by type of school(s) the raters were
assigned to (rural, urban, or suburban), no significant mean
differences within either of the two groups were identified.
However, the principals' mean ratings as compared to the
supervisors' mean ratings were significantly different for each of
the three items, as indicated in Table 3. The item rating means
for each of the three items for these two groups of raters were as
follows: knowledge of subject area, principals 3.03, supervisors
2.87 (t = 2.47, p = .01); other professional education
competencies, principals 2.96, supervisors 2.81 (t = 2.34,
p = .02); and overall competencies as educators, principals 2.93
and supervisors 2.73 (t = 3.34, p = .001). On each item the
supervisors' mean rating of the beginning teachers' competencies
was lower than the principals' mean rating. The item score means
for the total group of respondents (principals plus supervisors)
on each of the three items were, respectively, 2.95, 2.89, and
2.84.

Hypothesis three was rejected, as item rating means for the
principal, supervisor, and the total group of respondents were
below average (below 3.0) for eight of the nine rating means.
Thus, it is evident that these principals and supervisors perceived
beginning teachers as being less competent in testing and evaluation
skills as compared to their knowledge and skills in other areas.



Hypothesis Four: Comparison of Principals' and Supervisors' Ratings

A series of independent t tests were used to determine whether
or not the supervisors and principals differed significantly in
their ratings of beginning teachers' testing and evaluation needs
and proficiencies in each of the various competency areas. The
results of these analyses for the 17 test development competencies
and for the nine test score use competencies are presented in
Table 4 and Table 5, respectively.
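
A minimal sketch of an independent-groups t test follows. The
paper does not say whether variances were pooled, so the pooled
(Student) form used here is an assumption, and the two sets of
ratings are invented for illustration.

```python
import math
from statistics import mean, variance

def independent_t(a, b):
    """Student's t for two independent groups (pooled variance).

    Assumption: equal population variances in the two groups.
    Returns (t, degrees of freedom).
    """
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    standard_error = math.sqrt(pooled * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / standard_error, na + nb - 2

# Hypothetical proficiency ratings of one item by each rater group.
principals = [4, 5, 4, 3, 4]
supervisors = [3, 4, 3, 2, 3]

t_ind, df_ind = independent_t(principals, supervisors)
print(f"t({df_ind}) = {t_ind:.2f}")
```

Unlike the dependent-means test used for hypothesis one, this
comparison treats the two rater groups as separate samples, since
no principal is paired with a particular supervisor.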

The comparisons of the principals' ratings to the supervisors'
ratings of beginning teacher needs revealed no significant mean
difference for the combined 17 test development competencies
(principals 68.16 and supervisors 68.63, t = 0.61, p = .54) or for
the combined nine test score use competencies (principals 37.30
and supervisors 37.02, t = 0.33, p = .57). Further, the comparisons
for each individual competency item resulted in the identification
of only three significant mean differences (p < .05) among the 26
need items from the two sections of the questionnaire. This
suggests a high level of agreement between these two groups of
raters about the testing and evaluation needs of beginning teachers.
Of the three "need" items revealing a significant mean difference
between the two groups, two of the items were rated higher by the
principals as compared to the supervisors. These two items were:
calculating end of term grades, means of 4.04 and 3.87, respectively
(Table 5, t = 2.17, p = .03); and deciding the importance of tests
and papers, means of 4.25 and 4.11, respectively (Table 5,
t = 2.07, p = .04). The third item, use of less formal assessments,
was rated as a higher need by the supervisor group: means of 3.70
and 3.54, respectively (Table 4, t = 1.98, p = .05).

The series of comparisons between the principals' and
supervisors' ratings of beginning teachers' proficiencies revealed
a significant mean difference for the combined 17 test development
competencies and for the combined nine test score use competencies.
These mean differences, respectively, were: principals 50.74 and
supervisors 47.81 (F = 10.91, p = .001), and principals 27.52 and
supervisors 26.32 (F = 5.47, p = .02). Additionally, comparisons
for individual competency items resulted in 15 significant mean
differences among the 26 items. Each of these identified
significant differences revealed a pattern of higher ratings of
beginning teachers' testing and evaluation proficiencies by the
building principals as compared to the supervisors.

Even though the principals tended to rate the beginning
teachers' proficiencies higher than did the supervisors, it is
evident from examining the relative item rating magnitudes (ranks)
within both sets of proficiency items that the two groups of
raters were in rather high agreement about the relative levels of
proficiencies. In other words, the principals and supervisors
were in high agreement about which proficiencies of the beginning
teachers were relatively higher or lower as compared to the total
sets of proficiencies. This was also true of the needs ratings of
the two groups of administrators. The Spearman (Rho) correlation
coefficients presented on the last lines of Tables 4 and 5 indicate
a very high agreement (rank order coefficients of .92 or higher)
between the various sets of principals' and supervisors' need or
proficiency rating means for both the test development and test
score use competencies.
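
The rank-order agreement reported here can be illustrated with a
small Spearman Rho computation. The five item means per group are
invented, and the shortcut formula used assumes no tied ranks,
which may not hold for the actual tables.

```python
def ranks(values):
    """Rank values from highest (1) to lowest; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

def spearman_rho(x, y):
    """Spearman rank correlation via 1 - 6*sum(d^2) / (n*(n^2 - 1)).

    Valid only when neither list contains tied values.
    """
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical proficiency means for the same five items.
principal_means = [3.4, 2.9, 3.1, 2.6, 3.8]
supervisor_means = [3.1, 2.8, 2.7, 2.4, 3.5]

rho = spearman_rho(principal_means, supervisor_means)
print(f"rho = {rho:.2f}")
```

In this invented example the principals' means are uniformly
higher, yet the two groups order the items almost identically, so
Rho stays high. That is the pattern the paragraph above describes.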

The principals rated beginning teacher proficiencies
significantly higher than the supervisors on individual items one
through 11 of the 17 test development proficiency items (see
Table 4). Similarly, the principals rated the beginning teachers
significantly higher on items two, four, and eight (interpreting
scores, reteaching needs, and guiding learning) of the nine test
score use proficiencies (see Table 5).

In summation relative to hypothesis four, the comparison of
principals' and supervisors' ratings resulted in the acceptance of
hypothesis four 'a', as the principals and supervisors generally
agreed on the relative need for the various beginning teachers'
test development and test score use competencies, and in the
rejection of hypothesis four 'b', as the principals and supervisors
significantly differed in their ratings of beginning teachers'
proficiencies in both sections of the questionnaire. These data
indicate that the principals rated beginning teachers as having
higher test development and test score use proficiencies than did
the supervisors, but the two groups of raters were in general
agreement about the relative (separate rank orders) proficiency
levels of the beginning teachers within the set of 26 competencies.

Summary and Discussion 
The analyses of the data collected resulted in the rejection
of hypothesis one, as the combined supervisors' and principals'
ratings of the beginning teachers' needs were significantly higher
than their ratings of beginning teachers' proficiencies for each
of the 26 test and evaluation competency areas. This would
suggest that the total group of respondents viewed beginning
teachers' proficiencies in the area of tests and evaluation to be
less than adequate in terms of typical job needs. It might
suggest also that those professionals responsible for inservice
and preservice teacher training should give more attention to
testing and evaluation skills development.

Hypothesis two was accepted, as the combined supervisor and
principal respondents grouped by different types of schools
(rural, urban, or suburban) or by grade level assignments
(elementary, middle, or secondary grades) did not significantly
vary in either their ratings of teachers' testing and evaluation
competency needs or their ratings of beginning teachers' testing
and evaluation proficiencies. This would suggest that testing and
evaluation needs or proficiencies do not vary greatly from grade
to grade or school to school and that inservice training sessions
might include various school and grade level personnel without
being detrimental to the learning process.

Hypothesis three was rejected, as the combined principal and
supervisor respondents did not rate beginning teachers' test and
evaluation skills to be as high or higher than their knowledge of
subject areas, as high or higher than their other professional
education competencies, or as high or higher than their overall
competencies as educators. This might suggest that beginning
teachers' testing and evaluation skills are less well developed
than their other professional skills, and it further might confirm
that preservice and inservice trainers of teachers ought to give
more attention to testing and evaluation skill development.

Lastly, hypothesis four 'a' was accepted but four 'b' was
rejected. The separate principals' and supervisors' ratings
revealed a high degree of agreement between these two groups in
rating the needs of beginning teachers for the various test
development and test score use competencies. However, these two
groups of raters differed significantly in their ratings of
beginning teacher proficiencies in the various test development
and test score use competencies. Generally, the principals rated
the proficiencies of beginning teachers higher than did the
supervisors. Whether this difference in rating levels is a
consequence of differences in opportunities to observe beginning
teachers or of differences in relative rating tendencies of the
two groups of respondents could not be determined from the data.
The latter might be considered less likely, however, as the two
groups rated needs in a very similar manner.

Overall, the data collected suggested considerable agreement
among the principals and supervisors in their relative ratings of
typical teachers' needs and beginning teachers' proficiencies on
the 26 identified testing and evaluation competency areas. The
relative mean rating magnitudes within each group of competencies
for both the principals and supervisors were very similar
(Spearman Rho's of .92 and higher). This consistency, along with
the rating stability found across respondent grade and school
assignments and the very high consistency in principals' and
supervisors' mean ratings of needs, would appear to encourage one's
confidence in these data.

The magnitude of the discrepancy scores (need mean minus
proficiency mean) for each of the 26 testing competencies should
provide those concerned about either inservice or preservice
teacher training with a practical guide in designing content for
such training programs (true, this sample was limited to a single
state, but it is a populous state which employs beginning teachers
trained in many other states). It would appear that teacher
trainers might wish first to address competencies associated with
highly rated needs and with large discrepancy scores. Rather
specifically, this set of data would suggest that inservice or
preservice training might best emphasize the use of tests and
scores for reteaching of content, guiding student learning, and
positively influencing study and learning. Further, this data
would suggest that more practice be given in skills such as stating
measurable objectives, writing items that assess true student
progress, writing items that measure higher thinking processes,
and in the writing and scoring of essay items.
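
The discrepancy score and the suggested ordering of training priorities can be worked through in a few lines. This is a minimal sketch, not the authors' procedure: the need and proficiency means are a subset of the values printed in Table 1, while the cutoff of 4.0 for a "highly rated" need and the function names are assumptions introduced only for illustration.

```python
# (competency, need mean, proficiency mean): a subset of rows as printed in Table 1.
table1_rows = [
    ("Writing multiple choice items", 3.83, 2.99),
    ("Writing essay items", 4.27, 2.74),
    ("Scoring essay items", 4.35, 2.67),
    ("Stating clear/measurable objectives", 4.40, 2.87),
    ("Items measure higher thinking", 4.45, 2.55),
    ("Items measure true progress", 4.50, 2.78),
    ("Use sociometric type assessments", 3.19, 2.72),
]


def discrepancy(need, proficiency):
    """Discrepancy score: need mean minus proficiency mean, to two decimals."""
    return round(need - proficiency, 2)


def training_priorities(rows, min_need=4.0):
    """Keep competencies whose need mean reaches min_need (a hypothetical
    cutoff) and order them from largest to smallest discrepancy."""
    scored = [
        (name, need, discrepancy(need, prof))
        for name, need, prof in rows
        if need >= min_need
    ]
    return sorted(scored, key=lambda row: row[2], reverse=True)


for name, need, gap in training_priorities(table1_rows):
    print(f"{name:<38} need={need:.2f}  discrepancy={gap:.2f}")
```

With this subset, items that measure higher thinking come first (discrepancy 1.90), and low-need items such as sociometric assessments drop out regardless of their discrepancy.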






4/16/87 






Table 1

Principals' and Supervisors' Estimates of the Needs and Proficiencies of Beginning Teachers in
17 Test Development Competency Areas

Test Development Competencies             Need  Proficiency  Discrepancy     Rank*         t         p

1.  Writing multiple choice items         3.83         2.99          .84        12     19.53      .001
2.  Writing completion items              3.91         3.06          .85        11     19.75      .001
3.  Writing matching items                3.70         3.10          .60        15     13.73      .001
4.  Writing true/false items              3.51         2.99          .62        14     10.68      .001
5a. Writing essay items                   4.27         2.74         1.53       5.5     32.29      .001
5b. Scoring essay items                   4.35         2.67         1.68         3     36.06      .001
6.  Identifying good and poor items       4.34         2.83     [illeg.]  [illeg.]  [illeg.]  [illeg.]
7.  Items harmony school/class goals      4.33         2.79         1.54         4     34.12      .001
8.  Stating clear/measurable objectives   4.40         2.87         1.53       5.5     33.26      .001
9.  Items measure higher thinking         4.45         2.55         1.90         1     38.29      .001
10. Items measure true progress           4.50         2.78         1.72         2     38.39      .001
11. Use less formal assessments           3.61         2.86          .75        13     15.95      .001
12. Use observation assessments           4.02         2.96         1.06       9.5     24.14      .001
13. Use sociometric type assessments      3.19         2.72          .47      16.5     10.70      .001
14. Selecting items from manuals          3.60         3.13          .47      16.5     11.24      .001
15. Attractive test format                4.08         3.02         1.06       9.5     24.46      .001
16. Test coverage of text and class       4.51         3.19         1.32         8     32.18      .001

Combined items totals                    68.68        49.23

t-ratio 38.70
Probability level .001

*Rank ordered by magnitude of discrepancy






Table 2

Principals' and Supervisors' Estimates of the Needs and Proficiencies of Beginning Teachers in
Nine Test Score Use Competency Areas

Test Score Use Competencies                           Need  Proficiency  Discrepancy     Rank*         t         p

1.  Calculation, means, SD's, reliability             3.04         2.42          .62         9     12.97      .001
2.  Interpreting scores and student progress          4.24         2.88         1.36         5  [illeg.]      .001
3.  Identifying individual/class strength/weakness    4.33         2.95         1.38         4     35.27      .001
4.  Determining reteaching needs                      4.55         2.88         1.67         1     36.79      .001
5.  Use of tests and grades to influence learning     4.31         2.86         1.45         3     31.75      .001
6.  Calculating end of term grades                    3.98         3.34          .64         8     15.46      .001
7.  Grading tests, papers, etc.                       4.12         3.41          .71         7     17.86      .001
8.  Deciding importance tests, papers, etc.           4.20         3.15         1.05         6     25.05      .001
9.  Deriving information tests/guide learning         4.38         2.90         1.48         2     33.80      .001

Combined items totals                                37.28     [illeg.]

t-ratio 37.84
Probability level 0.000

*Rank ordered by magnitude of discrepancy






Table 3

Beginning Teachers' Testing Proficiencies Compared to Their Other Proficiencies, As Estimated by
Principals and Supervisors

Relative Proficiency Rating Items*               Principal  Supervisor     Total       t**         p

1. Relative to knowledge of their subject
   areas, beginning teachers' test and
   evaluation competencies are...                     3.03        2.87      2.95      2.47      .014

2. Relative to their other professional
   education competencies, such as planning,
   discipline, etc., beginning teachers'
   test and evaluation competencies are...            2.96        2.81      2.89      2.34      .020

3. Relative to their overall competencies as
   educators, beginning teachers' test and
   evaluation competencies are...                     2.92        2.73      2.84      3.34      .001

*Ratings were recorded via a five-point Likert-type scale: 5 (well above average), 4 (somewhat
above average), 3 (about average), 2 (somewhat below average), and 1 (much below average)

**Mean comparisons between principal and supervisor ratings






Table 4

Beginning Teachers' Test Development Competency Need and Proficiency Means, As Rated by Supervisors and
Principals

                                                    Need Rating Means                   Proficiency Rating Means
Test Development Competencies                 Prin.    Super.         t         p     Prin.    Super.         t         p

1.  Writing multiple choice items              3.79      3.89      1.32      .198      3.06      2.91      2.67      .008
2.  Writing completion items                   3.90      3.89      0.24      .810      3.13      2.97      2.60      .010
3.  Writing matching items                     3.73      3.65      1.01      .312      3.16      3.04      2.07      .039
4.  Writing true/false items               [illeg.]  [illeg.]  [illeg.]  [illeg.]  [illeg.]  [illeg.]  [illeg.]      .013
5a. Writing essay items                        4.20      4.32      1.6?      .097      2.85      2.59      3.69      .001
5b. Scoring essay items                        4.30      4.40      1.37      .171      2.71      2.53      3.42      .001
6.  Identifying good and poor items            4.30      4.35      0.78      .436      2.92      2.73      2.98      .003
7.  Items harmony school/class goals           4.30      4.35      0.76      .448      2.88      2.72      2.19      .029
8.  Stating clear/measurable objectives        4.35      4.42      1.06      .288      2.97      2.73      3.34      .001
9.  Items measure higher thinking              4.39      4.51      1.85      .065      2.65      2.43      2.91      .004
10. Items measure true progress                4.47      4.51      0.55      .581      2.8?      2.65      3.27      .001
11. Use less formal assessments                3.54      3.70      1.98      .048      2.93      2.79      2.15      .032
12. Use observation assessments                3.95      4.08      1.79      .075      3.02      2.91      1.62      .106
13. Use sociometric type assessments           3.22      3.13      1.26      .207      2.73      2.76      0.41      .680
14. Selecting items from manuals               3.59      3.57      0.29      .773      3.16      3.12      0.66      .511
15. Attractive test format                     4.06      4.08      0.21      .834      3.05      3.01      0.64      .523
16. Test coverage of text and class            4.49      4.53      0.66      .507      3.24      3.14      1.53      .127

Combined items total                          68.16     68.63                         50.48     47.88
t-ratio                                                            0.61                                    3.34
Probability level                                                            0.54                                    .001

Spearman rhos between ranks of means                     0.98                                    0.92






Table 5

Beginning Teachers' Test Score Use Competency Need and Proficiency Means, As Rated by Supervisors and by
Principals

                                                               Need Rating Means                   Proficiency Rating Means
Test Score Use Competencies                              Prin.    Super.         t         p     Prin.    Super.         t         p

1.  Calculation, means, SD's, reliability                 3.02      3.04  [illeg.]      .813  [illeg.]      2.38      0.75      .455
2.  Interpreting scores and student progress           4.27[?]  [illeg.]      0.96      .337  [illeg.]      2.77      2.51      .012
3.  Identifying individual/class strength/weakness    [illeg.]  [illeg.]      1.42      .156  [illeg.]      2.88      1.85      .065
4.  Determining reteaching needs                      [illeg.]  [illeg.]      0.97      .335      2.96      2.80      2.05      .040
5.  Use of tests and grades to influence learning      4.31[?]  [illeg.]      0.16      .875      2.92      2.79      1.90      .058
6.  Calculating end of term grades                    [illeg.]  [illeg.]      2.17      .031  [illeg.]      3.30      1.50      .134
7.  Grading tests, papers, etc.                        4.13[?]  [illeg.]      0.96      .336      3.47      3.36      1.72      .085
8.  Deciding importance tests, papers, etc.           [illeg.]  [illeg.]      2.07      .039      3.26      3.04      3.64      .001
9.  Deriving information tests/guide learning         [illeg.]  [illeg.]      0.09      .930  [illeg.]      2.83      2.21      .022

Combined items total                                  37.30[?]  [illeg.]                      [illeg.]     26.32
t-ratio                                                                       0.33                                [illeg.]
Probability value                                                                       .569                                    .020

Spearman rhos between ranks of means                            [illeg.]                                    0.98






